{
 "cells": [
  {
   "attachments": {},
   "cell_type": "markdown",
   "id": "524e2ff8",
   "metadata": {},
   "source": [
    "<a href=\"https://colab.research.google.com/github/run-llama/llama_index/blob/main/docs/examples/vector_stores/Elasticsearch_demo.ipynb\" target=\"_parent\"><img src=\"https://colab.research.google.com/assets/colab-badge.svg\" alt=\"Open In Colab\"/></a>"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "id": "307804a3-c02b-4a57-ac0d-172c30ddc851",
   "metadata": {},
   "source": [
    "# Elasticsearch\n",
    "\n",
    ">[Elasticsearch](http://www.github.com/elastic/elasticsearch) is a search database that supports full-text and vector search.\n"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "id": "b5331b6b",
   "metadata": {},
   "source": [
    "## Basic Example\n"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "id": "f3aaf790",
   "metadata": {},
   "source": [
    "In this basic example, we take the a Paul Graham essay, split it into chunks, embed it using an open-source embedding model, load it into Elasticsearch, and then query it. For an example using different retrieval strategies see [Elasticsearch Vector Store](https://docs.llamaindex.ai/en/stable/examples/vector_stores/elasticsearchindexdemo/).\n",
    "\n",
    "If you're opening this Notebook on colab, you will probably need to install LlamaIndex 🦙."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "0f51199e",
   "metadata": {},
   "outputs": [],
   "source": [
    "%pip install -qU llama-index-vector-stores-elasticsearch llama-index-embeddings-huggingface llama-index"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "d48af8e1",
   "metadata": {},
   "outputs": [],
   "source": [
    "# import\n",
    "from llama_index.core import VectorStoreIndex, SimpleDirectoryReader\n",
    "from llama_index.vector_stores.elasticsearch import ElasticsearchStore\n",
    "from llama_index.core import StorageContext"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "374a148b",
   "metadata": {},
   "outputs": [],
   "source": [
    "# set up OpenAI\n",
    "import os\n",
    "import getpass\n",
    "\n",
    "os.environ[\"OPENAI_API_KEY\"] = getpass.getpass(\"OpenAI API Key:\")"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "id": "d96fb0d0",
   "metadata": {},
   "source": [
    "Download Data"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "06874a37",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "2024-05-13 15:10:43 URL:https://raw.githubusercontent.com/run-llama/llama_index/main/docs/examples/data/paul_graham/paul_graham_essay.txt [75042/75042] -> \"data/paul_graham/paul_graham_essay.txt\" [1]\n"
     ]
    }
   ],
   "source": [
    "!mkdir -p 'data/paul_graham/'\n",
    "!wget -nv 'https://raw.githubusercontent.com/run-llama/llama_index/main/docs/examples/data/paul_graham/paul_graham_essay.txt' -O 'data/paul_graham/paul_graham_essay.txt'"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "8965583f",
   "metadata": {},
   "outputs": [],
   "source": [
    "from llama_index.embeddings.huggingface import HuggingFaceEmbedding\n",
    "from llama_index.core import Settings\n",
    "\n",
    "# define embedding function\n",
    "Settings.embed_model = HuggingFaceEmbedding(\n",
    "    model_name=\"BAAI/bge-small-en-v1.5\"\n",
    ")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "667f3cb3-ce18-48d5-b9aa-bfc1a1f0f0f6",
   "metadata": {},
   "outputs": [],
   "source": [
    "# load documents\n",
    "documents = SimpleDirectoryReader(\"./data/paul_graham/\").load_data()\n",
    "\n",
    "# define index\n",
    "vector_store = ElasticsearchStore(\n",
    "    es_url=\"http://localhost:9200\",  # see Elasticsearch Vector Store for more authentication options\n",
    "    index_name=\"paul_graham_essay\",\n",
    ")\n",
    "storage_context = StorageContext.from_defaults(vector_store=vector_store)\n",
    "index = VectorStoreIndex.from_documents(\n",
    "    documents, storage_context=storage_context\n",
    ")"
   ]
  },
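  {
   "attachments": {},
   "cell_type": "markdown",
   "id": "es-auth-note",
   "metadata": {},
   "source": [
    "The connection above assumes a local, unsecured cluster. If your deployment requires authentication (for example, Elastic Cloud), `ElasticsearchStore` also accepts credential parameters. The sketch below assumes the `es_cloud_id`/`es_api_key` and `es_user`/`es_password` parameters of `ElasticsearchStore`, and the placeholder values need to be replaced with your own deployment details; see [Elasticsearch Vector Store](https://docs.llamaindex.ai/en/stable/examples/vector_stores/elasticsearchindexdemo/) for the full set of connection options.\n",
    "\n",
    "```python\n",
    "# Elastic Cloud deployment, authenticated with an API key\n",
    "vector_store = ElasticsearchStore(\n",
    "    index_name=\"paul_graham_essay\",\n",
    "    es_cloud_id=\"<cloud-id>\",  # from the Elastic Cloud console\n",
    "    es_api_key=\"<api-key>\",\n",
    ")\n",
    "\n",
    "# or a secured self-managed cluster with basic auth\n",
    "vector_store = ElasticsearchStore(\n",
    "    index_name=\"paul_graham_essay\",\n",
    "    es_url=\"https://localhost:9200\",\n",
    "    es_user=\"elastic\",\n",
    "    es_password=\"<password>\",\n",
    ")\n",
    "```"
   ]
  },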
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "4d3658bd",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "The author worked on writing and programming outside of school. They wrote short stories and tried writing programs on an IBM 1401 computer. They also built a microcomputer kit and started programming on it, writing simple games and a word processor.\n"
     ]
    }
   ],
   "source": [
    "# Query Data\n",
    "query_engine = index.as_query_engine()\n",
    "response = query_engine.query(\"What did the author do growing up?\")\n",
    "print(response)"
   ]
  }
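  ,
  {
   "attachments": {},
   "cell_type": "markdown",
   "id": "query-options-note",
   "metadata": {},
   "source": [
    "The query engine retrieves the most similar chunks from Elasticsearch and asks the LLM to synthesize an answer from them. As a sketch of two common follow-ups (`similarity_top_k` and `response.source_nodes` are standard LlamaIndex options; `3` is just an example value):\n",
    "\n",
    "```python\n",
    "# retrieve more chunks per query and inspect what was used as context\n",
    "query_engine = index.as_query_engine(similarity_top_k=3)\n",
    "response = query_engine.query(\"What did the author do growing up?\")\n",
    "\n",
    "for node_with_score in response.source_nodes:\n",
    "    # each entry carries a retrieved chunk and its similarity score\n",
    "    print(node_with_score.score, node_with_score.node.get_content()[:100])\n",
    "```"
   ]
  }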
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3 (ipykernel)",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3"
  },
  "vscode": {
   "interpreter": {
    "hash": "0ac390d292208ca2380c85f5bce7ded36a7a25670a97c40b8009630eb36cb06e"
   }
  }
 },
 "nbformat": 4,
 "nbformat_minor": 5
}
